25 research outputs found

    Multi-objective evolution for Generalizable Policy Gradient Algorithms

    Full text link
    Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges relevant to many practical applications in which they present themselves in combination. Still, state-of-the-art RL algorithms fall short when addressing multiple RL objectives simultaneously and current human-driven design practices might not be well-suited for multi-objective RL. In this paper we present MetaPG, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criteria in which different RL objectives are encoded in separate fitness scores. Our findings show that, when using a graph-based implementation of Soft Actor-Critic (SAC) to initialize the population, our method is able to find new algorithms that improve upon SAC's performance and generalizability by 3% and 17%, respectively, and reduce instability up to 65%. In addition, we analyze the graph structure of the best algorithms in the population and offer an interpretation of specific elements that help trading performance for generalizability and vice versa. We validate our findings in three different continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum.Comment: 23 pages, 12 figures, 10 table

    Evolving Reinforcement Learning Algorithms

    Full text link
    We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.Comment: ICLR 2021 Oral. See project website at https://sites.google.com/view/evolvingr

    Discovering Representations for Black-box Optimization

    Full text link
    The encoding of solutions in black-box optimization is a delicate, handcrafted balance between expressiveness and domain knowledge -- between exploring a wide variety of solutions, and ensuring that those solutions are useful. Our main insight is that this process can be automated by generating a dataset of high-performing solutions with a quality diversity algorithm (here, MAP-Elites), then learning a representation with a generative model (here, a Variational Autoencoder) from that dataset. Our second insight is that this representation can be used to scale quality diversity optimization to higher dimensions -- but only if we carefully mix solutions generated with the learned representation and those generated with traditional variation operators. We demonstrate these capabilities by learning an low-dimensional encoding for the inverse kinematics of a thousand joint planar arm. The results show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than the standard MAP-Elites, and that, once solved, the produced encoding can be used for rapid optimization of novel, but similar, tasks. The presented techniques not only scale up quality diversity algorithms to high dimensions, but show that black-box optimization encodings can be automatically learned, rather than hand designed.Comment: Presented at GECCO 2020 -- v2 (Previous title 'Automating Representation Discovery with MAP-Elites'

    Small-scale proxies for large-scale Transformer training instabilities

    Full text link
    Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the ΞΌ\muParam (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms

    Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

    Full text link
    Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents

    Estimation of the national disease burden of influenza-associated severe acute respiratory illness in Kenya and Guatemala : a novel methodology

    Get PDF
    Background: Knowing the national disease burden of severe influenza in low-income countries can inform policy decisions around influenza treatment and prevention. We present a novel methodology using locally generated data for estimating this burden. Methods and Findings: This method begins with calculating the hospitalized severe acute respiratory illness (SARI) incidence for children <5 years old and persons β‰₯5 years old from population-based surveillance in one province. This base rate of SARI is then adjusted for each province based on the prevalence of risk factors and healthcare-seeking behavior. The percentage of SARI with influenza virus detected is determined from provincial-level sentinel surveillance and applied to the adjusted provincial rates of hospitalized SARI. Healthcare-seeking data from healthcare utilization surveys is used to estimate non-hospitalized influenza-associated SARI. Rates of hospitalized and non-hospitalized influenza-associated SARI are applied to census data to calculate the national number of cases. The method was field-tested in Kenya, and validated in Guatemala, using data from August 2009–July 2011. In Kenya (2009 population 38.6 million persons), the annual number of hospitalized influenza-associated SARI cases ranged from 17,129–27,659 for children <5 years old (2.9–4.7 per 1,000 persons) and 6,882–7,836 for persons β‰₯5 years old (0.21–0.24 per 1,000 persons), depending on year and base rate used. In Guatemala (2011 population 14.7 million persons), the annual number of hospitalized cases of influenza-associated pneumonia ranged from 1,065–2,259 (0.5–1.0 per 1,000 persons) among children <5 years old and 779–2,252 cases (0.1–0.2 per 1,000 persons) for persons β‰₯5 years old, depending on year and base rate used. In both countries, the number of non-hospitalized influenza-associated cases was several-fold higher than the hospitalized cases. Conclusions: Influenza virus was associated with a substantial amount of severe disease in Kenya and Guatemala. This method can be performed in most low and lower-middle income countries
    corecore